11 research outputs found

    Statistical physics, mixtures of distributions, and the EM algorithm

    We show that there are strong relationships between approaches to optimization and learning based on statistical physics or mixtures of experts. In particular, the EM algorithm can be interpreted as converging either to a local maximum of the mixture model or to a saddle point solution of the statistical physics system. An advantage of the statistical physics approach is that it naturally gives rise to a heuristic continuation method, deterministic annealing, for finding good solutions.
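
    To make the connection concrete, here is a minimal sketch of tempered EM for a two-component, one-dimensional Gaussian mixture, in which an inverse temperature beta is raised toward 1 in the spirit of deterministic annealing. It is an illustrative toy with an assumed initialisation and annealing schedule, not the formulation used in the paper.

        import numpy as np

        def em_step(x, pi, mu, sigma, beta=1.0):
            """One tempered EM update for a two-component 1-D Gaussian mixture.
            beta is an inverse temperature; beta = 1 recovers standard EM."""
            # E-step: responsibilities from tempered component log probabilities
            log_p = np.log(pi) - np.log(sigma) - 0.5 * ((x[:, None] - mu) / sigma) ** 2
            log_p = beta * log_p
            log_p -= log_p.max(axis=1, keepdims=True)          # numerical stability
            r = np.exp(log_p)
            r /= r.sum(axis=1, keepdims=True)
            # M-step: weighted maximum-likelihood parameter updates
            n_k = r.sum(axis=0)
            pi = n_k / len(x)
            mu = (r * x[:, None]).sum(axis=0) / n_k
            sigma = np.sqrt((r * (x[:, None] - mu) ** 2).sum(axis=0) / n_k) + 1e-9
            return pi, mu, sigma

        def fit_with_annealing(x, betas=(0.2, 0.5, 1.0), iters_per_beta=30):
            """Deterministic annealing: run tempered EM while raising beta toward 1,
            carrying the parameters from one temperature stage to the next."""
            rng = np.random.default_rng(0)
            pi = np.array([0.5, 0.5])
            mu = rng.choice(x, size=2, replace=False).astype(float)
            sigma = np.array([x.std(), x.std()])
            for beta in betas:
                for _ in range(iters_per_beta):
                    pi, mu, sigma = em_step(x, pi, mu, sigma, beta=beta)
            return pi, mu, sigma

        if __name__ == "__main__":
            rng = np.random.default_rng(1)
            data = np.concatenate([rng.normal(-2.0, 1.0, 300), rng.normal(3.0, 0.5, 200)])
            print(fit_with_annealing(data))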

    Numerical Simulations of Lattice QCD

    Numerical methods are used to investigate some of the non-perturbative properties of lattice QCD. With the aid of Monte Carlo techniques based on the canonical ensemble, we calculate the QCD potential between a pair of heavy quarks, in the quenched approximation (no dynamical quarks). We find that the potential exhibits a linear dependence on distance at distances of the order of a fermi, which is consistent with the expected confining property of QCD. At smaller distances, we observe that the potential follows a 1/R type behaviour. We also compute the mass of the 0++ glueball for the SU(3) gauge group. We implement several statistical improvements in this calculation, in order to extract the mass reliably from the Monte Carlo simulations. We obtain a mass value of ≈1400 MeV for this glueball state (in the quenched approximation). Finally, we use a numerical method, called the "demon" method, which is based upon the microcanonical ensemble, to measure the flow of lattice actions for the group SU(2) under renormalisation transformations generated by the Monte Carlo Renormalisation Group technique. We find that the demon method is ideally suited to the problem of tracking these renormalisation flows. Using the method, we are able to obtain an "improved" lattice action, which better describes physics near the continuum limit than the more straightforward naive actions.
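
    As an illustration of the kind of analysis described, the sketch below extracts an effective potential V(R) from Wilson-loop values assumed to behave as W(R,T) ≈ C(R) exp(-V(R)T) and fits a Cornell-type form with a 1/R term and a linear confining term. The loop data are synthetic placeholders and the routine is not taken from the thesis.

        import numpy as np
        from scipy.optimize import curve_fit

        def potential_from_wilson_loops(W, T_values):
            """Effective potential V(R) from W(R,T) ~ C(R) * exp(-V(R) * T):
            a straight-line fit of -ln W(R,T) against T gives slope V(R)."""
            V = []
            for w_row in W:                       # one row per separation R
                slope, _ = np.polyfit(T_values, -np.log(w_row), 1)
                V.append(slope)
            return np.array(V)

        def cornell(R, c, alpha, sigma):
            """Cornell-type parametrisation: constant - Coulomb term + linear term."""
            return c - alpha / R + sigma * R

        if __name__ == "__main__":
            # synthetic Wilson loops generated from an assumed potential (lattice units)
            R = np.arange(1, 7, dtype=float)
            T = np.arange(2, 8, dtype=float)
            true_V = cornell(R, 0.6, 0.28, 0.05)
            W = 1.2 * np.exp(-np.outer(true_V, T))          # W(R, T)

            V_meas = potential_from_wilson_loops(W, T)
            params, _ = curve_fit(cornell, R, V_meas, p0=(0.5, 0.3, 0.05))
            print("c, alpha, sigma =", params)              # sigma plays the role of the string tension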

    Mining scientific data


    KDD for Science Data Analysis: Issues and Examples

    The massive data sets collected by scientific instruments demand automation as a prerequisite to their analysis. There is an urgent need to create an intermediate level at which scientists can operate effectively, one that isolates them from the massive data sizes and harnesses human analysis capabilities for the tasks in which machines do not even remotely approach humans: creative data analysis, theory and hypothesis formation, and drawing insights into underlying phenomena. We give an overview of the main issues in the exploitation of scientific datasets, present five case studies where KDD tools play important and enabling roles, and conclude with future challenges for data mining and KDD techniques in science data analysis.

    RNA Folding on Parallel Computers - The Minimum Free Energy Structures of Complete HIV Genomes

    Secondary structure prediction is a standard tool in the analysis of RNA sequences. The prediction of RNA secondary structures is inherently non-local, which makes the analysis of long sequences (more than 4000 nucleotides) infeasible on present-day workstations. An implementation of the secondary structure prediction algorithm for hypercube-type parallel computers allows the structures of complete RNA virus genomes such as HIV-1 and other lentiviruses to be computed efficiently. 1. RNA Secondary Structures. RNA structure can be broken down conceptually into a secondary structure and a tertiary structure. The secondary structure is a pattern of complementary base pairings, see Figure 1. The tertiary structure is the three-dimensional configuration of the molecule. In contrast to the protein case, the secondary structure of RNA sequences is well defined; it provides the major set of distance constraints that guide the formation of tertiary structure, and covers the dominant energy contribution ..
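
    As a pointer to why the prediction is non-local and costly, the sketch below implements the simple Nussinov base-pair-maximisation recursion rather than the thermodynamic minimum-free-energy algorithm used in this work; the bifurcation term couples distant positions and drives the O(n^3) cost in the sequence length n.

        PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}

        def nussinov(seq, min_loop=3):
            """Maximum number of nested base pairs for an RNA sequence."""
            n = len(seq)
            M = [[0] * n for _ in range(n)]
            # fill cells in order of increasing span j - i; each cell only
            # needs cells of smaller span, which drives the cubic cost
            for span in range(min_loop + 1, n):
                for i in range(n - span):
                    j = i + span
                    best = M[i + 1][j]                        # position i unpaired
                    if (seq[i], seq[j]) in PAIRS:
                        best = max(best, M[i + 1][j - 1] + 1) # i pairs with j
                    for k in range(i + 1, j):                 # bifurcation: non-local term
                        best = max(best, M[i][k] + M[k + 1][j])
                    M[i][j] = best
            return M[0][n - 1]

        print(nussinov("GCGCUUCGCCGCGAAGC"))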

    Knowledge Discovery in RNA Sequence Families of HIV Using Scalable Computers

    Secondary structure prediction is a standard tool in the analysis of RNA sequences. The prediction of RNA secondary structures is inherently non-local, which makes the analysis of long sequences (more than 4000 nucleotides) infeasible on present-day workstations. An implementation of the secondary structure prediction algorithm for hypercube-type parallel computers allows the structures of complete RNA virus genomes such as HIV-1 and other lentiviruses to be computed efficiently. Introduction. One of the major problems facing computational molecular biology is the fact that sequence information about important macromolecules such as proteins and RNA molecules exists in far greater quantities than information about the three-dimensional structure of these biopolymers. The development and implementation of computational methods capable of predicting structure reliably on the basis of sequence information will provide huge benefits in terms of our understanding of the relationship between seque..
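
    The parallelisation idea can be sketched as follows: in the structure-prediction dynamic programme, all cells with the same span j - i depend only on cells of smaller span, so each span level can be filled concurrently. The toy below uses a Python process pool rather than the hypercube decomposition described in the work, and the scoring is again the simplified base-pair count, not the free-energy model.

        from concurrent.futures import ProcessPoolExecutor

        PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}

        def cell(args):
            """Best score for the interval [i, j], given the partially filled table M."""
            seq, M, i, j = args
            best = M[i + 1][j]                               # i left unpaired
            if (seq[i], seq[j]) in PAIRS:
                best = max(best, M[i + 1][j - 1] + 1)        # i pairs with j
            for k in range(i + 1, j):                        # bifurcation term
                best = max(best, M[i][k] + M[k + 1][j])
            return i, j, best

        def parallel_fill(seq, min_loop=3, workers=4):
            n = len(seq)
            M = [[0] * n for _ in range(n)]
            with ProcessPoolExecutor(max_workers=workers) as pool:
                for span in range(min_loop + 1, n):
                    # all cells on this span level are mutually independent
                    tasks = [(seq, M, i, i + span) for i in range(n - span)]
                    for i, j, best in pool.map(cell, tasks):
                        M[i][j] = best                       # barrier between span levels
            return M[0][n - 1]

        if __name__ == "__main__":
            print(parallel_fill("GGGAAAUCCUUGGGAAACCC"))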

    Real Time Data Mining, Management, and Visualization of GCM Output

    The output of simulations (e.g., global circulation models) can run into terabytes. The computational cost, as well as the cost of storing and retrieving model data, can be quite high. Recently there have been some efforts to develop on-line visualization capabilities that can be used, for example, to monitor whether the model is behaving properly. There are, however, many other uses for on-line data analysis, including feature extraction, computational steering of the model, and controlled saving of model output (e.g., more frequent samples of state information under certain conditions). Each of these applications is a potential client of model output data. We present a software architecture which stresses modularity and flexibility and supports a variety of clients. Some preliminary performance numbers are given from a prototype implementation. 1 Introduction. Supercomputer applications are often simulations that generate massive amounts of output. Global climate modeling is an example ..
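
    The sort of modular, multi-client architecture the abstract alludes to can be sketched as a dispatcher that hands each time step of model output to registered clients, such as an on-line visualisation monitor and a conditional saver. The class and method names below are illustrative assumptions, not taken from the paper.

        import numpy as np

        class OutputDispatcher:
            def __init__(self):
                self.clients = []

            def register(self, client):
                self.clients.append(client)

            def publish(self, step, field):
                # forward one time step of model output to every registered client
                for client in self.clients:
                    client.consume(step, field)

        class VisualizationMonitor:
            def consume(self, step, field):
                # stand-in for an on-line plot: report a summary statistic
                print(f"step {step}: global mean = {field.mean():.3f}")

        class ConditionalSaver:
            """Save output only when a condition of interest holds,
            e.g. more frequent samples when an extreme value appears."""
            def __init__(self, threshold):
                self.threshold = threshold
                self.saved = []

            def consume(self, step, field):
                if field.max() > self.threshold:
                    self.saved.append((step, field.copy()))

        if __name__ == "__main__":
            dispatcher = OutputDispatcher()
            dispatcher.register(VisualizationMonitor())
            saver = ConditionalSaver(threshold=2.5)
            dispatcher.register(saver)
            rng = np.random.default_rng(0)
            for step in range(5):                    # stand-in for model time steps
                dispatcher.publish(step, rng.normal(size=(64, 64)))
            print(f"{len(saver.saved)} steps saved under the extreme-value condition")

    The dispatcher knows nothing about its clients beyond the consume interface, which is one simple way to get the modularity and flexibility the abstract emphasises.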